About
John P drew our attention to various species that were not shown as common in the NVC standards, but which we frequently find in our sites, notwithstanding they have been assessed as belonging to a community that “should” not have them at high frequency. Prominent among these is Stellaria graminea. The purpose of this notebook is to track the development of an application to find these unconforming species.
Start by loading the data
source("db_extract.R")
the_data <- GetTheData()
And we’re going to need species_frequency by assembly; but maybe could decide to go by community?
frequency_by_assembly <- FrequencyByAssembly(the_data)
frequency_by_community <- FrequencyByCommunity(the_data)
Let’s plot mean frequency by assembly vs frequency by community for all species:
fba <- frequency_by_assembly %>% select(assembly_id, species_name, community, freq)
fbc <- frequency_by_community %>% select(species_name,community, freq)
jf <- left_join(fbc, fba, by = c("community", "species_name")) %>% filter(species_name == "Stellaria_graminea" & grepl("MG", community))
f <- ggplot(jf, aes(freq.x, freq.y, colour = community)) +
geom_point()+
xlab("frequency by community") + ylab("frequency by assembly")+
geom_segment(aes(x = 0, xend = 1, y = 0, yend = 1, colour ="red"))
print(f)

This is for S. graminea and the MG communities. So (a) there are lots more assemblies than there are communities (of course) and (b) frequency by assembly tends to be larger than frequency by community; aggregating over community has smoothed out a lot of variation. My feeling is that it will be more illuminating to eliminate the assembly variable to start with, we can explore variability with assembly later.
Next question: how to compare with standard values. Explore select into table mg_rodwell in the meadows database … in which we have columns community, species_id (and name, but not to be relied on) and p_central, the central frequency of the categories I .. V.
Stellaria graminea id 139. It so happens that p_central == 0.1 for S. graminea for all MG communities: SELECT Community, species_id, p_central FROM meadows.mg_rodwell where Community like “MG%” && species_id = 139; ( I wonder whether I can communicate directly with the DB from here using SQL?).
Anyway, it seems quite possible to collect the p_central values for all species and plot against frequency by community
q <- "SELECT Community, species_id, p_central FROM meadows.mg_rodwell where Community like 'MG%';"
std_freqs <- query(q)
Decimal MySQL column 2 imported as numeric
and match up with species frequencies by community
rm(jf) # We don't need it any more
jf1 <- left_join(frequency_by_community, std_freqs, by = c("community"="Community", "species_id"))
Lots of NAs here; (a) because mires (M) are included in the survey data but not in the mg_rodwell table - yet - because as its name implies, it is just the MG communities. And (b), because it seems we have detected some species that the standards don’t list in those communities at all. So (a), filter out all the mires; (b) set the remaining NAs to zero.
# Would be easier if we'd excluded mires from the_data to start with!
jf2 <- jf1 %>% filter(grepl("MG", community)) %>% replace_na(list(p_central = 0))
rm(jf1)
So now we want filter to include only cases where CrI5 > p_central or CrI95 < p_centra, and then plot (survey) frequencies against standard frequencies (p_central)
jf3 <- jf2 %>% filter(CrI5 > p_central || CrI95 < p_central)
f2 <- ggplot(jf3, aes(p_central, freq, colour = community)) +
geom_jitter(aes(text = species_name), size = 3) + #, name = "species_name")+
xlab("Standard frequency") + ylab("Survey frequency")+
geom_segment(aes(x = 0, xend = 1, y = 0, yend = 1, colour ="red"))
Ignoring unknown aesthetics: text
ggplotly(f2)
# print(f2)
This gives a surprising number of species in strong disagrrement with the standard (we may ignore the big cluster near (0, 0) because of the way CrI5 is calculated - it can be very near zero, but not actually zero). Interactive plot: is this useful?
LS0tDQp0aXRsZTogIlN0ZWxsYXJpYSBub3RlYm9vayINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQojIEFib3V0DQoNCkpvaG4gUCBkcmV3IG91ciBhdHRlbnRpb24gdG8gdmFyaW91cyBzcGVjaWVzIHRoYXQgd2VyZSBub3Qgc2hvd24gYXMgY29tbW9uIGluIHRoZSBOVkMgc3RhbmRhcmRzLCBidXQgd2hpY2ggd2UgZnJlcXVlbnRseSBmaW5kIGluIG91ciBzaXRlcywgbm90d2l0aHN0YW5kaW5nIHRoZXkgaGF2ZSBiZWVuIGFzc2Vzc2VkIGFzIGJlbG9uZ2luZyB0byBhIGNvbW11bml0eSB0aGF0ICJzaG91bGQiIG5vdCBoYXZlIHRoZW0gYXQgaGlnaCBmcmVxdWVuY3kuIFByb21pbmVudCBhbW9uZyB0aGVzZSBpcyAqU3RlbGxhcmlhIGdyYW1pbmVhKi4gVGhlIHB1cnBvc2Ugb2YgdGhpcyBub3RlYm9vayBpcyB0byB0cmFjayB0aGUgZGV2ZWxvcG1lbnQgb2YgYW4gYXBwbGljYXRpb24gdG8gZmluZCB0aGVzZSB1bmNvbmZvcm1pbmcgc3BlY2llcy4NCg0KU3RhcnQgYnkgbG9hZGluZyB0aGUgZGF0YQ0KDQpgYGB7cn0NCnNvdXJjZSgiZGJfZXh0cmFjdC5SIikNCnRoZV9kYXRhIDwtIEdldFRoZURhdGEoKQ0KYGBgDQoNCkFuZCB3ZSdyZSBnb2luZyB0byBuZWVkIHNwZWNpZXNfZnJlcXVlbmN5IGJ5IGFzc2VtYmx5OyBidXQgbWF5YmUgY291bGQgZGVjaWRlIHRvIGdvIGJ5IGNvbW11bml0eT8NCmBgYHtyfQ0KZnJlcXVlbmN5X2J5X2Fzc2VtYmx5IDwtIEZyZXF1ZW5jeUJ5QXNzZW1ibHkodGhlX2RhdGEpDQpmcmVxdWVuY3lfYnlfY29tbXVuaXR5IDwtIEZyZXF1ZW5jeUJ5Q29tbXVuaXR5KHRoZV9kYXRhKQ0KYGBgDQpMZXQncyBwbG90IG1lYW4gZnJlcXVlbmN5IGJ5IGFzc2VtYmx5IHZzIGZyZXF1ZW5jeSBieSBjb21tdW5pdHkgZm9yIGFsbCBzcGVjaWVzOg0KYGBge3J9DQpmYmEgPC0gZnJlcXVlbmN5X2J5X2Fzc2VtYmx5ICU+JSBzZWxlY3QoYXNzZW1ibHlfaWQsIHNwZWNpZXNfbmFtZSwgY29tbXVuaXR5LCBmcmVxKQ0KZmJjIDwtIGZyZXF1ZW5jeV9ieV9jb21tdW5pdHkgJT4lIHNlbGVjdChzcGVjaWVzX25hbWUsY29tbXVuaXR5LCBmcmVxKQ0KamYgPC0gbGVmdF9qb2luKGZiYywgZmJhLCBieSA9IGMoImNvbW11bml0eSIsICJzcGVjaWVzX25hbWUiKSkgJT4lIGZpbHRlcihzcGVjaWVzX25hbWUgPT0gIlN0ZWxsYXJpYV9ncmFtaW5lYSIgJiBncmVwbCgiTUciLCBjb21tdW5pdHkpKQ0KZiA8LSBnZ3Bsb3QoamYsIGFlcyhmcmVxLngsIGZyZXEueSwgY29sb3VyID0gY29tbXVuaXR5KSkgKw0KICBnZW9tX3BvaW50KCkrDQogIHhsYWIoImZyZXF1ZW5jeSBieSBjb21tdW5pdHkiKSArIHlsYWIoImZyZXF1ZW5jeSBieSBhc3NlbWJseSIpKw0KICBnZW9tX3NlZ21lbnQoYWVzKHggPSAwLCB4ZW5kID0gMSwgeSA9IDAsIHllbmQgPSAxLCBjb2xvdXIgPSJyZWQiKSkNCnByaW50KGYpDQpgYGANCg0KDQpgYGB7cn0NCmBgYA0KVGhpcyBpcyBmb3IgKlMuIGdyYW1pbmVhKiBhbmQgdGhlIE1HIGNvbW11bml0aWVzLiBTbyAoYSkgdGhlcmUgYXJlIGxvdHMgbW9yZSBhc3NlbWJsaWVzIHRoYW4gdGhlcmUgYXJlIGNvbW11bml0aWVzIChvZiBjb3Vyc2UpIGFuZCAoYikgZnJlcXVlbmN5IGJ5IGFzc2VtYmx5IHRlbmRzIHRvIGJlIGxhcmdlciB0aGFuIGZyZXF1ZW5jeSBieSBjb21tdW5pdHk7IGFnZ3JlZ2F0aW5nIG92ZXIgY29tbXVuaXR5IGhhcyBzbW9vdGhlZCBvdXQgYSBsb3Qgb2YgdmFyaWF0aW9uLiBNeSBmZWVsaW5nIGlzIHRoYXQgaXQgd2lsbCBiZSBtb3JlIGlsbHVtaW5hdGluZyB0byBlbGltaW5hdGUgdGhlIGFzc2VtYmx5IHZhcmlhYmxlIHRvIHN0YXJ0IHdpdGgsIHdlIGNhbiBleHBsb3JlIHZhcmlhYmlsaXR5IHdpdGggYXNzZW1ibHkgbGF0ZXIuDQoNCk5leHQgcXVlc3Rpb246IGhvdyB0byBjb21wYXJlIHdpdGggc3RhbmRhcmQgdmFsdWVzLiBFeHBsb3JlIHNlbGVjdCBpbnRvIHRhYmxlIG1nX3JvZHdlbGwgaW4gdGhlIG1lYWRvd3MgZGF0YWJhc2UgLi4uIGluIHdoaWNoIHdlIGhhdmUgY29sdW1ucyBjb21tdW5pdHksIHNwZWNpZXNfaWQgKGFuZCBuYW1lLCBidXQgbm90IHRvIGJlIHJlbGllZCBvbikgYW5kIHBfY2VudHJhbCwgdGhlIGNlbnRyYWwgZnJlcXVlbmN5IG9mIHRoZSBjYXRlZ29yaWVzIEkgLi4gVi4NCg0KKlN0ZWxsYXJpYSBncmFtaW5lYSogaWQgMTM5LiBJdCBzbyBoYXBwZW5zIHRoYXQgcF9jZW50cmFsID09IDAuMSBmb3IgKlMuIGdyYW1pbmVhKiBmb3IgYWxsIE1HIGNvbW11bml0aWVzOg0KU0VMRUNUIENvbW11bml0eSwgc3BlY2llc19pZCwgcF9jZW50cmFsIEZST00gbWVhZG93cy5tZ19yb2R3ZWxsIHdoZXJlIENvbW11bml0eSBsaWtlICJNRyUiICYmIHNwZWNpZXNfaWQgPSAxMzk7DQooIEkgd29uZGVyIHdoZXRoZXIgSSBjYW4gY29tbXVuaWNhdGUgZGlyZWN0bHkgd2l0aCB0aGUgREIgZnJvbSBoZXJlIHVzaW5nIFNRTD8pLg0KDQpBbnl3YXksIGl0IHNlZW1zIHF1aXRlIHBvc3NpYmxlIHRvIGNvbGxlY3QgdGhlIHBfY2VudHJhbCB2YWx1ZXMgZm9yIGFsbCBzcGVjaWVzIGFuZCBwbG90IGFnYWluc3QgZnJlcXVlbmN5IGJ5IGNvbW11bml0eQ0KDQpgYGB7cn0NCnEgPC0gIlNFTEVDVCBDb21tdW5pdHksIHNwZWNpZXNfaWQsIHBfY2VudHJhbCBGUk9NIG1lYWRvd3MubWdfcm9kd2VsbCB3aGVyZSBDb21tdW5pdHkgbGlrZSAnTUclJzsiDQpzdGRfZnJlcXMgPC0gcXVlcnkocSkNCmBgYA0KYW5kIG1hdGNoIHVwIHdpdGggc3BlY2llcyBmcmVxdWVuY2llcyBieSBjb21tdW5pdHkNCmBgYHtyfQ0Kcm0oamYpICMgV2UgZG9uJ3QgbmVlZCBpdCBhbnkgbW9yZQ0KamYxIDwtIGxlZnRfam9pbihmcmVxdWVuY3lfYnlfY29tbXVuaXR5LCBzdGRfZnJlcXMsIGJ5ID0gYygiY29tbXVuaXR5Ij0iQ29tbXVuaXR5IiwgInNwZWNpZXNfaWQiKSkNCg0KYGBgDQoNCkxvdHMgb2YgTkFzIGhlcmU7IChhKSBiZWNhdXNlIG1pcmVzIChNKSBhcmUgaW5jbHVkZWQgaW4gdGhlIHN1cnZleSBkYXRhIGJ1dCBub3QgaW4gdGhlIG1nX3JvZHdlbGwgdGFibGUgLSB5ZXQgLSBiZWNhdXNlIGFzIGl0cyBuYW1lIGltcGxpZXMsIGl0IGlzIGp1c3QgdGhlIE1HIGNvbW11bml0aWVzLiBBbmQgKGIpLCBiZWNhdXNlIGl0IHNlZW1zIHdlIGhhdmUgZGV0ZWN0ZWQgc29tZSBzcGVjaWVzIHRoYXQgdGhlIHN0YW5kYXJkcyBkb24ndCBsaXN0IGluIHRob3NlIGNvbW11bml0aWVzIGF0IGFsbC4gU28gKGEpLCBmaWx0ZXIgb3V0IGFsbCB0aGUgbWlyZXM7IChiKSBzZXQgdGhlIHJlbWFpbmluZyBOQXMgdG8gemVyby4NCmBgYHtyfQ0KIyBXb3VsZCBiZSBlYXNpZXIgaWYgd2UnZCBleGNsdWRlZCBtaXJlcyBmcm9tIHRoZV9kYXRhIHRvIHN0YXJ0IHdpdGghIA0KamYyIDwtIGpmMSAlPiUgZmlsdGVyKGdyZXBsKCJNRyIsIGNvbW11bml0eSkpICU+JSByZXBsYWNlX25hKGxpc3QocF9jZW50cmFsID0gMCkpDQpybShqZjEpDQpgYGANClNvIG5vdyB3ZSB3YW50IGZpbHRlciB0byBpbmNsdWRlIG9ubHkgY2FzZXMgd2hlcmUgQ3JJNSA+IHBfY2VudHJhbCBvciBDckk5NSA8IHBfY2VudHJhLCBhbmQgdGhlbiBwbG90IChzdXJ2ZXkpIGZyZXF1ZW5jaWVzIGFnYWluc3Qgc3RhbmRhcmQgZnJlcXVlbmNpZXMgKHBfY2VudHJhbCkNCmBgYHtyfQ0KamYzIDwtIGpmMiAlPiUgZmlsdGVyKENySTUgPiBwX2NlbnRyYWwgfHwgQ3JJOTUgPCBwX2NlbnRyYWwpDQpmMiA8LSBnZ3Bsb3QoamYzLCBhZXMocF9jZW50cmFsLCBmcmVxLCBjb2xvdXIgPSBjb21tdW5pdHkpKSArDQogIGdlb21faml0dGVyKGFlcyh0ZXh0ID0gc3BlY2llc19uYW1lKSwgc2l6ZSA9IDMpICsgIywgbmFtZSA9ICJzcGVjaWVzX25hbWUiKSsNCiAgeGxhYigiU3RhbmRhcmQgZnJlcXVlbmN5IikgKyB5bGFiKCJTdXJ2ZXkgZnJlcXVlbmN5IikrDQogIGdlb21fc2VnbWVudChhZXMoeCA9IDAsIHhlbmQgPSAxLCB5ID0gMCwgeWVuZCA9IDEsIGNvbG91ciA9InJlZCIpKQ0KZ2dwbG90bHkoZjIpDQojIHByaW50KGYyKQ0KDQpgYGANClRoaXMgZ2l2ZXMgYSBzdXJwcmlzaW5nIG51bWJlciBvZiBzcGVjaWVzIGluIHN0cm9uZyBkaXNhZ3JyZW1lbnQgd2l0aCB0aGUgc3RhbmRhcmQgKHdlIG1heSBpZ25vcmUgdGhlIGJpZyBjbHVzdGVyIG5lYXIgKDAsIDApIGJlY2F1c2Ugb2YgdGhlIHdheSBDckk1IGlzIGNhbGN1bGF0ZWQgLSBpdCBjYW4gYmUgdmVyeSBuZWFyIHplcm8sIGJ1dCBub3QgYWN0dWFsbHkgemVybykuIEludGVyYWN0aXZlIHBsb3Q6IGlzIHRoaXMgdXNlZnVsPw0KDQoNCg==